Semi-Supervised Learning to Extract Attribute-Value Pairs from Product Descriptions on the Web

Authors

  • Katharina Probst
  • Rayid Ghani
  • Marko Krema
  • Andy Fano
  • Yan Liu
Abstract

We describe an approach to extract attribute-value pairs from product descriptions on the Web. The goal is to augment product databases by representing each product as a set of such attribute-value pairs. Such a representation is useful for a variety of tasks where treating the product as a set of attribute-value pairs is more useful than as an atomic entity. Examples include product recommendations, comparison of single products or complete offerings, and demand forecasting. We formulate the extraction as a classification problem and use Naïve Bayes combined with a multi-view semi-supervised algorithm (co-EM). The extraction system requires very little initial user supervision: using unlabeled data, it automatically extracts an initial seed list that serves as training data for the classification algorithm. In addition to the automatically extracted training data, the co-EM algorithm uses the unlabeled data to extract product attributes and values. Finally, the extracted attributes and values are linked to form pairs using dependency information and co-location scores. We present promising results on Web product descriptions in two categories of sporting goods products.
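The co-EM idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of views (the candidate word itself versus its neighboring words), the two-label set, the toy data, and the helper names `train_nb`, `predict_nb`, and `co_em` are all assumptions made for illustration. Each view gets its own Naïve Bayes classifier, and the two classifiers take turns probabilistically labeling the unlabeled data for each other.

```python
import math
from collections import defaultdict

LABELS = ("attribute", "value")  # assumed label set, simplified for the sketch

def train_nb(examples, weights, vocab):
    """Fit multinomial Naive Bayes from soft-labelled token lists.
    weights[i] maps each label to the probability that example i has it."""
    prior = {c: 1.0 for c in LABELS}               # add-one smoothed class mass
    counts = {c: defaultdict(float) for c in LABELS}
    for toks, w in zip(examples, weights):
        for c in LABELS:
            prior[c] += w[c]
            for t in toks:
                counts[c][t] += w[c]               # fractional (soft) counts
    total = sum(prior.values())
    model = {}
    for c in LABELS:
        denom = sum(counts[c].values()) + len(vocab)
        log_like = {t: math.log((counts[c][t] + 1.0) / denom) for t in vocab}
        model[c] = (math.log(prior[c] / total), log_like)
    return model

def predict_nb(model, toks):
    """Return a normalized label distribution; unknown tokens are skipped."""
    scores = {c: model[c][0] + sum(model[c][1][t] for t in toks if t in model[c][1])
              for c in LABELS}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def co_em(labeled, unlabeled, iterations=5):
    """labeled: list of ((view0_tokens, view1_tokens), label).
    unlabeled: list of (view0_tokens, view1_tokens).
    Returns one trained model per view."""
    vocab = [set(), set()]
    for x in [x for x, _ in labeled] + unlabeled:
        for v in (0, 1):
            vocab[v].update(x[v])
    hard = [{c: 1.0 if c == y else 0.0 for c in LABELS} for _, y in labeled]
    # Start from a view-0 classifier trained on the seed examples only.
    models = {0: train_nb([x[0] for x, _ in labeled], hard, vocab[0])}
    for _ in range(iterations):
        for src, dst in ((0, 1), (1, 0)):
            # The source view labels the unlabeled data probabilistically ...
            soft = [predict_nb(models[src], x[src]) for x in unlabeled]
            # ... and the other view retrains on seeds plus those soft labels.
            models[dst] = train_nb(
                [x[dst] for x, _ in labeled] + [x[dst] for x in unlabeled],
                hard + soft, vocab[dst])
    return models

# Toy data (hypothetical): view 0 is the candidate word, view 1 its context.
seeds = [((["color"], ["the_", "_is"]), "attribute"),
         ((["blue"], ["is_", "_."]), "value")]
pool = [(["color"], ["its_", "_is"]), (["size"], ["its_", "_is"]),
        (["blue"], ["in_", "_."]), (["red"], ["in_", "_."])]
models = co_em(seeds, pool)
print(predict_nb(models[0], ["size"]))
print(predict_nb(models[0], ["red"]))
```

In this toy run the words "size" and "red" never appear in the seed list; the word-view classifier can only learn about them through the context view, which is the point of training the two views against each other.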

Similar resources

Extracting and Using Attribute-Value Pairs from Product Descriptions on the Web

We describe an approach to extract attribute-value pairs from product descriptions in order to augment product databases by representing each product as a set of attribute-value pairs. Such a representation is useful for a variety of tasks where treating a product as a set of attribute-value pairs is more useful than as an atomic entity. We formulate the extraction task as a classification prob...

Semi-Supervised Learning of Attribute-Value Pairs from Product Descriptions

We describe an approach to extract attribute-value pairs from product descriptions. This allows us to represent products as sets of such attribute-value pairs to augment product databases. Such a representation is useful for a variety of tasks where treating a product as a set of attribute-value pairs is more useful than as an atomic entity. Examples of such applications include product recomme...

Extracting Attribute-Value Pairs from Product Specifications on the Web

Comparison shopping portals integrate product offers from large numbers of e-shops in order to support consumers in their buying decisions. Product offers often consist of a title and a free-text product description, both describing product attributes that are considered relevant by the specific vendor. In addition, product offers might contain structured or semi-structured product specifications in...

DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web

The web is a rich resource of structured data. There has been an increasing interest in using web structured data for many applications such as data integration, web search and question answering. In this paper, we present DEXTER, a system to find product sites on the web, and detect and extract product specifications from them. Since product specifications exist in multiple product sites, our ...

Towards 'Interactive' Active Learning in Multi-view Feature Sets for Information Extraction

Research in multi-view active learning has typically focused on algorithms for selecting the next example to label. This is often at the cost of lengthy wait-times for the user between each query iteration. We deal with a real-world information extraction task, extracting attribute-value pairs from product descriptions, where the learning system needs to be interactive and the user's time needs ...

Publication date: 2006